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Abstract: Reconstructed mass variables, such as M 2 , M 2 C, and Mj^, play an 
essential role in searches for new physics at hadron colliders. The calculation of these 
variables generally involves constrained minimization in a large parameter space, which is 
numerically challenging. We provide a C++ code, Optimass, which interfaces with the 
Minuit library to perform this constrained minimization using the Augmented Lagrangian 
Method. The code can be applied to arbitrarily general event topologies, thus allowing the 
user to significantly extend the existing set of kinematic variables. We describe this code, 
explain its physics motivation, and demonstrate its use in the analysis of the fully leptonic 
decay of pair-produced top quarks using M 2 variables. 
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1 Introduction 

The CERN Large Hadron Collider (LHC) has successfully completed the decades-long quest 
to discover the particles of the Standard Model (SM) by finding the Higgs Boson [1, 2]. 
The paramount question in the current Run 2 of the LHC is whether the LHC can reach 
the relevant energy scale to discover physics beyond the standard model (BSM). Popular 
frameworks for new physics such as Supersymmetry (SUSY) [3-6] and Universal Extra 
Dimensions (UED) [7, 8] predict: 
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1. The presence of particles, such as neutralinos or KK-photons, that are not recon¬ 
structed in the detector, and are hence termed “invisible”. In general, the production 
of these particles will lead to “missing transverse energy”^ (MET) in an event. 

2. Relatively complex decay topologies, in which pair-produced, generally colored, par¬ 
ticles undergo several subsequent decays. Each “decay chain” thus produces one or 
more visible particles, as well as at least one invisible particle. 

Searches for new particles that produce such long decay chains in combination with a 
MET signature are complicated by large backgrounds from tt, W, and Z production, often 
with additional jets from initial state radiation (ISR). The severity of these backgrounds in 
multijet and multilepton channels increases with the collider energy. Even if a signal of such 
new physics is seen, the corresponding measurements of particle properties such as masses, 
couplings, and spins, are highly nontrivial [9-11]. Therefore, sophisticated procedures must 
be used to separate signal from background and to extract the quantum numbers of the 
new particles. 

In a given event, one observes some number of “physics objects” which correspond 
to physical particles that have produced appropriate energy deposits in the tracker and 
calorimeters of the detector; we will refer to these objects as “visible particles”. We shall 
denote their measured four-momenta by Pj, where j is the visible particle label. At the 
same time, the existence of a non-vanishing MET in the event indicates the presence of 
some number of additional, invisible, particles with four-momenta where k now labels 
the invisible particles. The individual momenta are not measured, and the only available 
piece of information is the missing transverse momentum 

fr ^'^qTk = - (IT) 

k j 

The MET, IpT, is then simply the magnitude of the missing transverse momentum vector: 

$t^Wt\- ( 1 - 2 ) 

The essential question is how to take these sets of measured four-momenta, {Pj}a, (one for 
each event a) and determine whether the events are produced purely by SM processes or 
whether new physics is at work. The standard procedure is to construct some kinematic 
variable, v, and compare the v distribution predicted by the SM to the data. In choosing 
a suitable variable, v, one typically follows one of these approaches: 

1. The variable, v, is an analytic function of some, but not all, of the measured momen¬ 
tum degrees of freedom. This is the preferred approach in inclusive analyses, where 
one targets a specific subset of the event. For example, in an inclusive search for a vis¬ 
ibly decaying resonance, X, one would select only the momenta of the hypothesized 
decay products ji,j 2 , and form the invariant mass of the resonance X, 

Mx = {Pj, +Pj 2 + , (1-3) 

^Or more precisely, missing transverse momentum. 
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leaving out all other objects in the event. ^ In the case of missing energy events, 
the situation is much more complicated — we cannot reconstruct the mass of the 
resonance as in eq. (1.3), since we have not measured the momenta, q^, of the invisible 
decay products. Then, one typically tries to form a variable which correlates with the 
scale of Mx- Various candidates have been tried, including the transverse momentum 
of the hardest object of a given type (lepton, jet, etc.) [12, 13], the scalar pj’ sum of the 
four hardest jets (or of all jets) [14], the jet multiplicity [15, 16], the “fat” jet mass [17], 
the “contransverse mass”, Mqt [18-20], the lepton energy [21-23] or lepton energy 
ratios [24, 25], and many more. The advantage of such techniques is their simplicity 
and robustness — they do not involve too many theoretical assumptions, making 
them ideal for model-independent searches for new physics. At the same time, they 
appear to be suboptimal, since they do not utilize the full set of measured degrees 
of freedom, leading to a certain loss of information. It is also rather challenging to 
assign a proper physical meaning to a kinematic variable which only uses such partial 
information (for more detailed discussion, see refs. [14, 26, 27]). 

2. The variable, v, is an analytic function of some measured momentum degrees of free¬ 
dom and the measured The explicit inclusion of the measured IpT in the definition 
of V was the next attempt to design a better performing class of variables. Perhaps 
the best known example is the W transverse mass [28, 29], where one identifies the 
transverse momentum of the missing neutrino with the measured ^t- Other possi¬ 
bilities include the “effective mass”. Mg// [14, 30], the VSmin variable [31-33], and 
the “razor” variables [34-37]. The outputs of neural nets, boosted decision trees, and 
other multivariate analyses [38], particularly those involving some form of machine 
learning, are also variables in this class. Incorporating the measured ^Ti which is 
often a sensitive variable all by itself [39, 40], into the definition of the kinematic 
variable, v, is certainly a step in the right direction. 

3. The definition of the variable, v, involves all measured momentum degrees of freedom 
and the individual invisible momenta, q^. Finally, one may construct the variable, v, 
so that from the onset it has explicit dependence on the individual invisible momenta, 
q^. The advantage of this approach is that one works with theoretically motivated 
quantities with clear physical meaning [26]. The obvious downside is that the indi¬ 
vidual invisible particle momenta, q^, are unknown, and something must be done to 
fix their values in the calculation. There are two possible alternatives: 

• Integrate over all possible values of the invisible momenta. Perhaps the simplest 
solution is to allow all possible values of the invisible momenta, q^, which are 
consistent with the measured missing transverse momentum (1.1), and compute 
the variable, v, as a suitably weighted average. A celebrated example of this 
approach is the Matrix Element Method (MEM) [41-43] which is finding in¬ 
creased use in hadron collider physics [44-53]. However, the method is often too 

^In forming the variable (1.3), one also ignores, e.g., additional angnlar information. 
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computationally challenging, requiring novel ideas and approaches [54-56]. Ad¬ 
ditionally, it is generally difficult to incorporate “reducible” backgrounds, which 
consist of events where the reconstructed particles are misidentified and/or their 
momenta significantly mismeasured. 

• Use a physically motivated ansatz for the invisible momenta. Alternatively, in¬ 
stead of considering all possible values of the set of invisible momenta, { 9 ^}, one 
could fix them by following some prescription specified in advance. With this 
approach, one gives up on trying to “guess” the correct values of the invisible 
momenta, and instead focuses on constructing a useful variable, v, whose prop¬ 
erties can reveal important information about the underlying physics. Examples 
of such variables include the Cambridge Mt 2 variable [57, 58] and its variants 
[59-61], and the variables M 2 C [62, 63], Mct 2 [64, 65], Mf [66], [67], 

and Mmin [68]. The variables in this class are often specified, not by analytic 
formulae, but by the algorithm used to calculate them^. 

Our focus in this paper will be on the algorithmically specified variables from the very 
last category, which are known to possess several attractive features: 

1. They are “maximally constraining” [26, 76] in the sense that, on an event per event 
basis, they provide the best possible lower bound on an invariant mass quantity 
of interest, such as the parent masses or the center-of-mass energy, v/j. This is 
particularly useful in cases where it is not possible to determine the actual values of 
that quantity due to incomplete event information. 

2. Their kinematic distributions exhibit sharper endpoints which are easier to measure 
over the SM backgrounds [77], leading to a more precise determination of the new 
physics mass spectrum. 

3. Certain measurements of their properties can be used as a self-consistency check on 
the assumed signal event topology [77, 78]. If the check fails, our conjecture about 
the event topology is falsified, thus narrowing down the allowed set of possibilities. 

At the same time, such algorithmically defined variables are notoriously difficult to 
compute. The algorithmic procedure typically involves calculating a mass for a set of 
hypothesized particles in an event (possibly after projecting their momenta onto the trans¬ 
verse plane), and minimizing the value of that mass with respect to all invisible momenta, 
q^. What makes the problem particularly challenging, however, is the presence of addi¬ 
tional non-linear mass-shell constraints. In essence, we are faced with a multidimensional 
constrained optimization problem, where our objective function is an energy function which 
is to be minimized. Given the importance of the maximally constraining invariant mass 
variables for new physics searches and measurements [26, 76-78], it is important to have 
publicly-available software for performing constrained minimizations to calculate kinematic 

®For some of the variables, analytical formulas may exist in certain special cases [69-75], but not in 
general. 
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variables from high energy collision data with sufficient generality and efficiency. The exist¬ 
ing packages described in the literature are typically only applicable to a specific variable, 
e.g., Mt 2 [58, 79-81], or [67] and cannot be readily generalized to the whole class of 
on-shell constrained variables [76, 77]. The standard approach is to try to solve the con¬ 
straining equations, thus reducing the unknown number of degrees of freedom (d.o.f.), then 
implement an unconstrained minimization over the remaining d.o.f. While this approach 
generally provides the most efficient algorithm for a specific event topology, it is not ex¬ 
tendable to more general event topologies, where not all constraints can be simultaneously 
solved analytically. 

In this paper, we describe an alternative approach that is sufficiently universal and 
flexible, and can be applied to arbitrarily general event topologies. The main idea is 
to use the Augmented Lagrangian Method (ALM) [82, 83], briefly described below in 
section 2.3. In this approach, the feasibility (i.e., the validity of the constraints) is ensured 
by penalizing infeasibility by adding “penalty terms” to the objective function, rather 
than by directly solving the constraining equations. The fact that the method does not 
require the solving of any constraints beforehand makes it very flexible and applicable to 
a very general class of event topologies. Of course, we still have to perform a standard 
unconstrained minimization, for which we can take advantage of any one of the many 
publicly available packages — we have chosen to use Minuit [84], which is widely popular 
in high energy physics. We also supply a comprehensive software package, Optimass"^, in 
the form of a library, which interfaces with Minuit to perform the constrained minimization 
of a user-specified kinematic function using the ALM. Appendix B contains instructions 
on the installation and usage of Optimass. 

The paper is organized as follows. In section 2, we review the general problem of 
constrained optimization with special emphasis on the motivation and the techniques used 
by Optimass, in particular the ALM. The relevant Minuit routines with which it inter¬ 
faces are described in Appendix A. Section 3 describes in detail the algorithm behind the 
Optimass package and presents several toy examples for its validation. In section 4 we 
briefly review the M 2 variables [26, 76, 77], and provide examples of their use, in the study 
of the fully leptonic decays of pair-produced top quarks. We use this as an opportunity to 
compare the results from OPTIMASS with previous studies and known analytical calcula¬ 
tions. Finally, section 5 is reserved for our conclusions and a brief discussion of the future 
of Optimass. 

2 Review of constrained minimization 

While many excellent textbooks and references discuss the optimization techniques that 
we will utilize (see, e.g., refs. [85-87] and references therein), we feel that a brief review 
of the elements of optimization theory relevant to the operation of Optimass will prove 
useful. We first note that while we will sometimes speak of “optimization” rather than 
“minimization”: (i) in the calculation of kinematic variables we will be interested only in 
minimization, and (ii) the methods used to find a maximum are, up to obvious changes 

^For OPTImization of MASS variables. 
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in the sign of certain parameters, identical to those used to find a minimum. So in what 
follows we will not worry much about this distinction. 

The first issue that will concern us is the important division of minimization problems 
into two types: (i) constrained and (ii) unconstrained. In constrained minimization, we 
want to minimize an objective function. 


fix) 


( 2 . 1 ) 


subject to a set of m constraints 


Ca{x) = 0, a = 1, ...,m, 


( 2 . 2 ) 


where x, in general, refers to a point in M” formed from some unknown momentum degrees 
of freedom xi, X2 ,..., x„. In what follows, we shall assume that the number of constraints, 
m, is always less than the number of independent degrees of freedom, n, so that we are 
dealing with a true minimization problem. 

If the constraints in eq. (2.2) are all independent, then the parameter space is effectively 
reduced from n dimensions to n—m dimensions. Sometimes this reduction can be performed 
analytically. For example, consider an optimization problem in M^, subject to the constraint 
x^ + = 1 — then we should parameterize the two dimensional subspace that satisfies 

the constraints, i.e., the surface of the unit sphere, S 2 , by the angles 6 and (p in the standard 
way. 

However, this reduction of dimensionality cannot always be performed analytically. A 
useful alternative procedure, therefore, is to turn the constrained minimization problem into 
the problem of an unconstrained minimization^ of a modified objective function, /(x), 
over the full, unconstrained, parameter space, M"'. This new problem can then be solved 
by setting the gradient of some function equal to zero, or by searching for a local/global 
minimum using one of the many standard numerical algorithms conventionally used for 
this purpose. 

When performing this unconstrained minimization iteratively, one develops an algo¬ 
rithm for finding the location of the minimum of /(x), x*, which is also referred to as the 
minimizer. At each iteration, one starts with some initial estimate, x^, (typically taken to 
be the minimizer of the previous, /c — 1®*, iteration), then refining this estimate by obtaining 
a new minimizer, x^+i, in some prescribed way, until certain convergence criteria are met. 
Since we have not analytically solved the constraints (2.2), the estimates, x^, will not, in 
general, satisfy the constraints exactly. Following the standard mathematical terminology, 
we shall refer to values of x that satisfy the constraints in eq. (2.2) as feasible. The ab¬ 
solute value of Ca{x) is then a measure of the feasibility®. Even though feasibility is not 
strictly guaranteed, the ultimate solution found by the method should nevertheless be such 
that the constraints are satisfied to within a required degree of numerical precision. 

®Or possibly, a series 0 /unconstrained minimization problems. 

® A point X in the unknown momentum space R" is feasible if it is an element of the feasible set Q defined 


by 


= { a; I Ca{x) = 0, a = 1,.., m} . 
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In the remaining three subsections of this section, we discuss three possible ways to 
transform a constrained minimization problem into an unconstrained minimization prob¬ 
lem, namely (i) the method of Lagrange multipliers (in section 2.1), (ii) penalty methods 
(in section 2.2), and (hi) the Augmented Lagrangian Method (in section 2.3). We shall 
see how penalty methods solve some of the problems associated with the use of Lagrange 
multipliers, while the ALM, in turn, resolves certain numerical issues related to the use 
of penalty methods. Of course we also must be able to solve the resulting unconstrained 
optimization problem; we discuss approaches to this challenge in Appendix A. 


2.1 The method of Lagrange multipliers 

As noted above, in a generic constrained minimization problem we are looking for the 
minimum value of the target function, f{x), subject to the constraints (2.2): 


f(x*) = min f(x) such that Ca=i mix*) = 0. 


(2.3) 


Alternatively, one is trying to find the location of the minimizer, x*, in M”. We note that 
in practical applications of the maximally constraining invariant mass variables both the 
minimum value of the function, f{x*), and the minimizer, x*, itself can serve a useful 
purpose. For example, in the Mr 2 -assisted on-shell (MAOS) reconstruction method, the 
minimizer is used to provide an ansatz for certain transverse components of the invisible 
momenta [88, 89]. Both the function, f{x), and the constraints, Ca(x), are assumed to be 
smooth^ real-valued functions in M"". 

The method of Lagrange multipliers provides necessary and sufficient conditions for 
finding local solutions of the minimization problem (2.3) above. In this method, we define 
a corresponding Lagrangian, henceforth denoted by £, for an objective function, /(x), and 
constraints, Ca(x), by 


Cix,X) = fix) - ^ AaCa(x), (2.4) 

a=l 


where A = (Ai, A 2 ,..., Am) is an m-component vector® of Lagrange multipliers, Aq. We 
now describe the conditions that must be satisfied in this method in order to establish the 
existence of a local minimum at the proposed minimizer, x*. 


First order condition (FOC). In unconstrained optimization, a necessary condition for 
the existence of a local minimum of / at x* is that the gradient vector, Vxfix*), vanishes. 
As this condition involves the gradient, we may term it a first order condition. In the 
method of Lagrange multipliers, an analogous condition holds for the existence of a local 
minimum. Here it is only necessary that the gradient of the objective function, /, at x* 
is orthogonal to the surface defined by the constraints. This condition will hold if the 

^In some cases, the objective function may depart from smoothness in a specihc way; we discuss this 
point and related issues in sections 3.2.2 and 4.1, see also ref. [77]. 

® Throughout this paper we shall use the notation v for n-dimensional vectors in the space of independent 
variables, R"', and v to denote m-dimensional vectors in the space of constraints, R’". 
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gradient of /, Vxfix*), is an element of the vector space spanned by the gradient vectors 
of Ca, VxCa{x*). Thus the FOC is that there exists a Lagrange multiplier vector, A*, in 
M™, such that at the point {x*, A*) G (g) the following conditions hold 


r Va£(F,A*) =Ca(F) =0, 

[ Vx C{x*,X*) = V. fix*) - Ea CaiS*) = 0. 


(2.5) 


Therefore, at least at the level of the FOC, the problem of constrained optimization has 
been reformulated as an unconstrained optimization problem. 


Second order condition (SOC). As is well-known, the condition that a first derivative 
vanishes is not sufficient to establish that there is an extremum at that point — one must 
also verify that the second derivative is, in the case of a minimum, positive. The extension of 
this idea to unconstrained optimization in many dimensions is to require that the Hessian 
matrix, defined for the objective function, /, by 


Hjkix*) 


dxjdxk ’ 


( 2 . 6 ) 


and evaluated at the prospective minimizer, x*, be positive dehnite®. This is the “second 
order condition” (SOC); that both the FOC and the SOC are satisfied is sufficient for the 
existence of a local minimum. As long as we are not at a boundary of the parameter space, 
these conditions are both necessary and sufficient. 

We now consider the corresponding SOC for the Lagrange multiplier method, hoping 
that, as in the case of the FOC, we can obtain a condition analogous to the case of uncon¬ 
strained minimization, i.e., a constraint involving the positive definiteness of some Hessian. 
We therefore evaluate the Hessian of the Lagrange function (2.4) with respect to x: 


iHc)jkix*,X*) 


a^£(F,A*) 

dxjdxk 


d^fi^) d'^Caix*) 

dxjdxk ^ “ dxjdxk 


(2.7) 


To determine the SOC, we note that for x* to be a minimizer (with Lagrange multiplier 
vector. A*), we must have 

(F Hcix*,X*)d > (2.8) 


for any infinitesimal displacement, d, from the proposed minimizer, x*, in a direction 
allowed by the constraints. 

Thus the relevant condition is not the positive definiteness of the Hessian (2.7), but the 
positive definiteness of the restriction of the Hessian to the space allowed by the constraints. 
The fact that we must still use the constraints to see if a given stationary point (i.e., a 
point satisfying the FOC) is a minimum, means that we have failed in our mission to 
convert the constrained minimization problem to an unconstrained minimization problem. 
We therefore proceed to look for methods for which the SOC does not explicitly involve 
the constraints. 

®In linear algebra, a symmetric nx n real matrix, M, is said to be positive definite if z'^Mz is positive 
for every non-zero column vector, z, of n real numbers. 
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2.2 Penalty methods 

A natural approach to our problem of transforming a constrained optimization problem to 
an unconstrained optimization problem is to follow the example of the method of Lagrange 
multipliers in modifying the objective function, but to do so in a different way. Clearly, we 
would like to modify the function so that infeasibility incurs a penalty. One possible way 
to achieve this is via “convexification” of the geometry near the desired solution point, i.e., 
making sure that a solution to the constrained minimization problem is a local minimum 
of the transformed function, f{x), even if it is not a local minimum of f{x) in the absence 
of constraints. We now proceed to give an example of one such convexification approach. 

In the so-called penalty methods, the original objective function, f{x), is modified 
by the addition of a penalty term, i.e., a functional of Ca{x) weighted by a positive penalty 
parameter, /i, so that the term vanishes when Ca{x) = 0, but becomes large if Ca{x) / 0 
in the /r —)• 0 limit. While there are various penalty methods, which differ in the form of 
the penalty function, we consider the “Quadratic Penalty Method” (QPM) here, because 
of its connection with the ALM discussed below. In the QPM, the penalty term is chosen 
so that the modified function^*^ under consideration is 

P{x-,fi) = f{x) + ( 2 - 9 ) 

a 

In the course of the algorithm, the parameter, p, will be reduced, as the desired properties 
of this function hold in the /i —?■ 0 limit. The gradient of (2.9) is 

VP(f;/r) = V/(x) +J]^Vca(x), (2.10) 

n ' 


which reproduces the gradient of the Lagrange function (2.5) if we take 

A,(^) = (2.11) 

/i 

to be (the components of) the Lagrange multiplier vector, A. This shows that the necessary 
FOC for a minimum is the same as for the method of Lagrange multipliers, only the 
Lagrange multiplier vector is now determined for us. Since a minimization using a QPM 
should give the same value for the solution, x*, as the Lagrange multiplier method (which 
gives the solution (x*, A*)), it follows that as we approach this solution, we must have 

Jim(-5i<£>)=A:. (2.12) 

X^X* \ /^ / 


thus ICa(x) I —)■ 0 as p —)■ 0. 

In determining the SOC for the QPM, we note that the Hessian of the function in (2.9) 
is 

UXjUXk fJ, \ J 


^°From here on, we shall use a semicolon, to separate the arguments of a function into two groups: 
independent variables, with respect to which an optimization is to be done, and fixed parameters. 
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If X* satisfies the constraints, then this expression simplifies to 


Hp{x*)jk 


dxjdxk 


H— 




(2.14) 


If we consider an inhnitesimal displacement, d, along the surface allowed by the constraints, 
then as 

d- Vca{x*) = 0, (2-15) 

we find that 

(F Hp{x*)d = d^ Hf{x*)d, (2.16) 

where Hf{x) is the Hessian (2.6) for the original objective function, i.e., djdkf{x). 

On the other hand, for displacements, d', orthogonal to this surface, we obtain non-zero 
terms from both the objective function and the penalty term, i.e. 

d’^Hp{x*) d! = d!^ Hf{x*) d! + - ^(Vca(f*) • d!f, (2.17) 

n 


where Vca(x) • d' is now non-vanishing. In the // —)• 0 limit, the second term will dom¬ 
inate. So if the Hessian is positive definite with respect to allowed displacements, it is 
automatically positive definite in general. Thus, we now have that in the limit of ^ 0, 

the sufficient condition that the stationary point, x*, be a local minimum is simply that the 
Hessian of the function (2.9) is positive definite. Thus we have succeeded in overcoming 
the limitation of the method of Lagrangian multipliers in that both the FOC and SOC are 
the same as in unconstrained minimization. 

Unfortunately, all is not well when it comes to practical applications of the QPM. 
The basic problem is that in the /r —)• 0 limit the algorithm becomes too sensitive to 
small departures from feasibility, hence numerical instabilities may prevent convergence 
to the solution. In more formal language, small values of /r —)• 0 can result in severe ill- 
conditioning of the Hessian, since the rightmost term in (2.14), which dominates in the 
// —)• 0 limit, is not invertible. Therefore, we will need to further modify the QPM, retaining 
the way in which it maps constrained optimization problems to unconstrained optimization 
problems, but reducing the relative importance given to feasibility in the /r —0 limit. The 
solution, presented in the next subsection, is the Augmented Lagrangian Method (ALM). 


2.3 Augmented Lagrangian Method 

We seek a method which preserves the success of the QPM in generating FOC and SOC that 
correspond exactly to those obtained in unconstrained minimization, but which overcomes 
the difficulties encountered by the QPM in the ^ 0 limit. One approach is to introduce 

augmented Lagrange multiplier terms, leading to the following modified objective 
function: 

C{x ; A, //) = f{x) - ^ \aCa{x) + 

n ' n 


10 




Note that A is no longer determined numerically by the optimization procedure, but is 
instead a fixed vector, just like the penalty parameter, /r. As was the case for the QPM, 
(2.18) is used iteratively; unconstrained minimization of £(x;A, is performed for the 
chosen values of A and /r in each step of the procedure. After optimizing C{x ; A, /r) to 
within some tolerance, new values of A and // are chosen; the process is repeated until the 
desired levels of optimality and feasibility are satisfied. This procedure is the ALM. 

We must verify that the ALM indeed avoids the problems of the QPM. To do this, we 
note that 

V£(f;A, ;,) = V/(x)-^AaVca(f) + ^-Vca(f), (2.19) 

a a ' 

while the Hessian is given by 




dxjdxk 


„ X/ f(^Ca)j(VCa)fc + Ca{x)V‘j ,^Ca{x) \ . 

a ^ ^ ^ a ^ ^ 


( 2 . 20 ) 


We note that (2.19) recovers the expression for the method of Lagrange multipliers (2.5) 
with the substitution 

Aa^A,-^, (2.21) 

which shows that the FOCs for optimality are the same as for the problem of unconstrained 
minimization, just as in the case of the QPM. It is also possible, albeit more challenging, to 
show that the SOC for a minimum is the same as in the unconstrained case; see refs. [82], 
[83], and [90] for details. 

We also conclude from (2.21) that in the asymptotic limit 

A:^A,-^!^. (2.22) 

fX 

in analogy to (2.12). The point of the ALM is that now that we are free to choose both Aa 
and fjL, we can enforce (2.22) without taking the ^ 0 limit. We shall do this by choosing 

the value of Aa in the {k + 1)®* iteration as follows 

= AJ - (2.23) 

As we do not take 0, the Hessian defined in (2.20) should never become ill-conditioned. 
Hence the ALM avoids the major drawback of the QPM, while preserving its successes. We 
therefore implement the ALM in the Optimass code, as described in the following section. 


3 The Optimass code 

Having reviewed both unconstrained and constrained minimization, we now state precisely 
how Optimass accomplishes the task of minimizing a mass function with constraints. The 
basic algorithm is presented in figure 1 and is the main subject of this section^^. 

^^Readers who are not interested in the details of the code may skip directly to the next section. 
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Figure 1. Flowchart of the minimization procedure in Optimass. 

3.1 Operational algorithm 

3.1.1 Step one: initialization 

At the onset, we must specify certain parameters. We first discuss the parameters associ¬ 
ated with the optimality condition, followed by those related to the feasibility condition. 
Finally, we will discuss the penalty/Lagrange multiplier parameter, and the starting value 
for the minimizer. 

Optimality condition. The test of optimality, i.e., whether the relevant sub-minimization 
procedure, has reached an acceptable local minimum in the iteration, should be per¬ 
formed in terms of some relevant geometric quantity. In OPTiMASS, we choose the Migrad 
algorithm of MiNUiT (see Appendix A.l), where the optimality criterion utilizes the so 
called “Estimated vertical Distance to the Minimum” (EDM) defined as follows 

EDM = ^ (^V£(xo)) ■ H~^{xo)-VCixo) = ^Ax'^ ■ H{xo) ■ Ax ^ f{x) - f{xo), (3.1) 
where 

Ax = X — xq = —H{x(})~^ ■ V£(xo). (3.2) 
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Using the EDM (3.1), at each iteration k in Optimass the optimality convergence test in 
Minuit is performed by checking if the EDM is smaller than the optimality tolerance, ujk'- 

EDM < LOk- (3.3) 

We have observed that simply setting to a constant, uj*, suffices. By default in Migrad, 
this constant is set by the internal re-parametrization 

OJ* = 0.001 • TOLERANCE • UP. (3.4) 

In Optimass, the optimality tolerance oj* is controlled and set by the tolerance param¬ 
eter through the interface with Migrad^^. Its default value is set to be 

TOLERANCE = 0.1, (3.5) 

while the parameter up is not used, it is set internally in Optimass to up = 1. 

Feasibility condition: Throughout the whole minimization procedure, we test for fea¬ 
sibility by computing the quantity^^ 

Mxk)\? ^^cl{xk) (3.6) 

a 

in each iteration. Then, the test for feasibility is 

\\c{xk)\\ < rjk, (3.7) 

where rjk denotes the feasibility tolerance in the iteration; this criterion evolves iteration- 
by-iteration, unlike the optimality parameter ujk- Although r/k eventually approaches zero 
as fe —?• oo, we set the final convergence criterion as follows: 

\\c{xk)\\<r]*, (3.8) 

where rj* denotes the terminal feasibility tolerance set to be 

rj* = 0.001 X M. (3.9) 

Here M is the appropriate typical mass scale associated with the target mass function, its 
value should be chosen depending on the specific physics process at hand. In the current 
version of OPTIMASS, the user is expected to provide the relevant fixed value of the scale 

^^The specific method to set this quantity is 

MnMigrad::operator()(unsigned int maxfcn, double tolerance), 

where maxfgn denotes maximum number of function calls after which the Minuit minimization routine 
will be stopped even if it has not yet obtained satisfactory convergence to an acceptable minimum. The 
default value is maxfcn = 5000. 

^®In eq. (3.6), we assume that all constraints Ca have the same mass dimension dim{ca) = dim{c) = 1, 
otherwise each term in the RHS should be raised to the appropriate power of l/dim{ca)- 
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M}^ The rules for updating rjk will be described in section 3.1.3. The starting feasibility 
tolerance is initialized by 


rjo = r]{mm [/ro, 7 ])^° 


Here the pre-factor rj is given by 


rj = a ■ rj 


(3.10) 

(3.11) 


with some coefficient a chosen in the range 

10 < a < 1000, (3.12) 

fiQ is the starting penalty parameter given in (3.16) below, while 7 G (0,1), and our default 
for it is 


7 = 0.2. (3.13) 

By adjusting the coefficient a within the range (3.12), one can control the relative scale 
of the initial feasibility tolerance, r/o, to its terminal value, t]* . This in turn determines 
the relative number of iterations within each of the two regimes (phase -1 and phase- 2 ) 
described in section 3.1.3 below. If a is too large, the ALM iterations in phase-1 terminate 
very quickly, and the majority of the ALM iterations are performed in the regime of phase- 
2, where the Lagrange-multipler driven evolution may not be efficient. On the other hand, 
if a is chosen to be too small, most of the ALM iterations will be done in the regime of 
phase-1, which only reduces the penalty parameter In that case, Optimass will not be 
able to take full advantage of the ALM method in avoiding the ill-conditioning as explained 
in section 2.2. We recommend that users test several different values of a, until the number 
of iterations in each phase is adequate and the results are stable. 

Finally, the “tightening” parameters, /3^ G (0,1), for the feasibility constraint are set 
to be 


/30 = 0.5, (3.14) 

= 0.3 {k > 1). (3.15) 

Penalty and Lagrange multiplier parameters: The penalty parameter, /r, and La¬ 
grange multiplier parameter vector, A, are updated in each iteration (the actual assignment 
rule will be explained below in section 3.1.3). Their starting values are 

/io = 0.1 (3.16) 

A° = 0 (a = l,--- ,m). (3.17) 

For the reduction of the penalty parameter p, Optimass introduces another parameter, 
G (0,1), defined in (3.20) and by default set to 

Tf, = 0.5. (3.18) 

^^One possibility is M = where dim{f{x)) is the mass dimension of f{x). 
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If the value of is too small, the penalty parameter n may decrease too quickly, causing 
strong convexification, which can lead to ill-conditioning. Conversely, if the value of is 
too large, one can experience slow convergence and a premature transition to phase-2 (see 
section 3.1.3 below). 

Initial minimizer and initial step size for Minuit: The initial guess for the minimizer 
of the objective function, Xq, (referred to as “seeding” in the flowchart of figure 1), is set 
by the invisible momentum configuration corresponding to Smin [31], i.e., the invisible 
momenta are such that the total invariant mass in the event is minimized. In addition, the 
initial step size, Axg, from from the Xq toward the final minimizer, x*, is another input 
parameter for the Minuit initialization^^. 

3.1.2 Step two: unconstrained minimization with Minuit 

Once the initial parameters have been chosen, we use the ALM mapping of a constrained 
minimization problem to an unconstrained minimization problem and then perform the 
latter minimization with Minuit (see Appendix A). In the process of this minimization, 
we perform an appropriate adjustment of parameters in each step of the algorithm. The 
code has two different options for minimization with Minuit in the iteration; 

1. Migrad: 

Using as a starting value the minimizer, Xk-i, obtained in the previous iteration, 
Migrad searches for a minimizer, x^q. 

2. Simplex and Migrad: 

Again using as a starting value the minimizer, Xk-i, obtained in the previous itera¬ 
tion, Simplex first finds a minimizer, x^^^, and then Migrad uses x^^s a starting 
value to find the final minimizer, Xk^ 2 - 

Among the two possible answers, x^^i and Xfc^ 2 , we choose as our minimizer, x^, the one 
which gives a lower value for the objective function. One could repeat either algorithm 1 
or algorithm 2 (or other minimization procedures) with various starting points, xq, to find 
a global minimum more accurately. However, we find that the combination of algorithms 1 
and 2 described above is adequate to obtain accurate on-shell constrained M 2 values, while 
keeping the computational effort to a minimum. Interestingly, we find that algorithms 1 
and 2 are complementary to each other. For example, in the M 2 calculations of section 4, 
we found that for the maximally constrained case of M 2 CC, the final solution, x^, was given 
by the answer x^q from algorithm 1 {xk ,2 from algorithm 2) in 83% (17%) of the events. 
This trend was reversed in the calculation of the minimally constrained case of M 2 XX, 
where algorithm 1 (algorithm 2) supplied the final solution x^ in 19% (81%) of the events. 
This behavior can be understood as follows. Migrad relies on gradient information, thus 

^®The initial input values for Xq and Axq can be set via the method 

MnUserParameters::Add(const char* par-name, double init-point, double init-step-size). 
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it can be fast and accurate if the objective function is smooth and continuous. On the other 
hand, Simplex does not require gradient information, and can handle more complicated 
functions (including “folds” and “creases”), although the ultimate accuracy is not as high. 
In the case of M 2 cc^ the objective function is convexified by the penalty terms, which 
makes the relevant geometry near the local minimum smooth and well-defined, thus we 
expect algorithm 1 by Migrad to outperform algorithm 2. However, it is known that the 
M 2 XX objective function, “the maximum of the two invariant masses”, develops a crease, 
on which the solution is found [77], and therefore one might expect algorithm 2 by Simplex 
to work better for this case. The performance of Optimass with regard to the M 2 variables 
will be discussed in more detail in section 4. 

3.1.3 Step three: ALM parameter adjustment 

Once the minimization routine has obtained a value of the minimizer, Xk, we then evaluate 
the constraints at this minimizer, i.e., Ca{xk)- Depending on the value of the feasibility 
(3.6), we define three phases: “Phase 1”, “Phase 2”, and “Phase 3”. While “Phase 3” is 
nothing but terminating the entire minimization procedure, the other two phases basically 
tighten the feasibility tolerance. Our tightening scheme is inspired by the LANCELOT 
package [91]. As mentioned earlier, the tolerance, Uk, for the optimality condition (3.3) is 
not evolved: ojk = co*. 

Phase 1. The feasibility condition is far from satisfied: 

||c(ffc)|| > max[r/fc,r/*] . (3.19) 

In this case we put more weight on the penalty term by reducing ^k+i for the next iteration. 
At the same time, the Lagrange multiplier vector, A^, remains unchanged. In the next 
iteration, the starting value of the minimizer, is set to be the minimizer obtained 


in the previous iteration, x^- Finally, the feasibility tolerance, %+!, 
detailed updating rules for Phase 1 are thus 

is also evolved. The 

t^k+l — Tfi ’ 

(3.20) 

\/c+l _ \k 

(3.21) 

^/c+1 ~ 

(3.22) 

Vk+i = fi{j ■ ■ 

(3.23) 


Phase 2. The feasibility condition is converging, but insufficient to terminate the algo¬ 
rithm: 

V* < \\cixk)\\ < Vk- (3.24) 

In this case, we do not reduce the penalty parameter, and instead adjust the values of 
the Lagrange multiplier vector by the rule in eq. (2.23). The starting value of the minimizer, 
^1+1) for the next iteration is set as in Phase 1. The feasibility tolerance is also modihed. 
The detailed updating rules are given by 

t^k+l ~ l^kj (3.25) 
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_ \k Ca(Xfc) 

(3.26) 


yk 

®l+i 

— 

(3.27) 


ofc+l 


yk+i 

= Vk ■ Kkli ■ 

(3.28) 


Phase 3. In this phase, we have achieved sufficient feasibility: 

\\c{xk)\\<v*. (3.29) 

We therefore break the sub-minimization loop and return Xk as the final value, x*, of the 
minimizer. 


3.2 Validation 

We now demonstrate the performance of the algorithm described in the previous subsec¬ 
tion with two simple examples, for which one can also obtain analytic solutions for the 
minimizer, x*, and the Lagrange multiplier vector, A*. The first example, considered in 
section 3.2.1, yields a well-defined solution at a unique global minimum. In the second 
example, treated in section 3.2.2, we find that the solution for the Lagrange multiplier 
vector is not well-defined because the relevant objective function is “folded” and is not 
differentiable at x*. The examples illustrate the evolution of the ALM parameters and 
demonstrate how the solution found in the iteration converges to the true value in 
terms of feasibility and optimality. 


3.2.1 Example one 

Our first example involves minimizing the objective function 


f{x,y)=x + y, (3.30) 

over the usual plane, x = {x,y), subject to the constraint 

x2 + y^-l = 0. (3.31) 


This constraint implies that our solution must lie on a unit circle centered at the origin. 
Clearly, the function (3.30) is minimized along the circle at the point 




^2 _y2\ 

2 ’ 2 y’ 


(3.32) 


which is the global minimizer in this example. 

The objective function, (3.30), is plotted in the left panel of figure 2. The locus of 
feasible points (i.e., the unit circle about the origin,) is shown in black. With the Lagrange 
multiplier method, one adds a Lagrange multiplier term to the objective function as in 
(2.4) 

C{x,y, X) = X + y - X{x‘^ + y"^ - 1). (3.33) 
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Optimized Augmented Lagrangian of f(x,y) = x + y 




Figure 2. Test of the ALM for the objective function f{x, y) = x + y subject to x"^ + y^ = 1. (a) 
Plot of the objective function (color coded) and the constraint curve (in black), (b) Contour plot 
of the augmented Lagragian (3.38) for the last (fifth) iteration. The magenta points denote the 
minimizers, Xk, found at each iteration. 


There are two stationary points given by 

(x*,y% A*) = ± 

The Hessian corresponding to the function (3.33) is 

I 0 -2A, 


\/2 \/2 \/2 
2 ’ 2 ’ 2 


(3.34) 


(3.35) 


and the condition for a minimum (i.e., that He should be positive definite) requires us to 
choose A < 0, which selects the correct minimizer among the two stationary points (3.34): 




^/2 



(3.36) 


conhrming the earlier result (3.32). 

We now wish to verify that the Optimass algorithm reproduces this solution. After 
running the code, we obtain 


= (-0.707106, -0.707106; -0.707180), (3.37) 

which is consistent with (3.36) {'\/2l2 = 0.707107...). We note that this convergence only 
required five steps, suggesting that the minimum was found relatively easily. The right 
panel of figure 2 shows a contour plot of the augmented Lagrangian, 

£(x,y;/i 5 , As) = (x+ y) - As (x^+ - 1) + ;^(x^+ - 1)^ with (3.38) 

2^5 
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Iteration Number (k) 





Figure 3. Tracing the convergence state of the solution and the evolution of relevant parameters 
of the ALM algorithm for the minimization problem depicted in figure 2. 

= (0.027,-0.707180), 

for the final minimization step {k = 5) at which the correct solution, (3.37), was obtained. 
The convexification due to the penalty term 

1 / 2 , 2 i\2 

+s -1) 

ensures that the solution of the constrained optimization problem is (at least) a local 
minimum. The magenta dots (connected by green lines) in the figure denote the solution, 
Xk, of the minimization, starting with a randomly chosen initial point, Xq = (0, 0.7). 

In the four panels of figure 3, we trace the evolution of several parameters of the 
algorithm as well as the properties of the approximate solutions, Xk- The upper left panel 
shows the evolution of the intermediate feasibility tolerance, (blue dashed line) and the 
real feasibility, ||cj(xfc)||, (red solid line) calculated at the end of each iteration, as well as 
the scale set by the ultimate feasibility tolerance, rj*, given by the horizontal black dashed 
line. We see that in the first two iterations ||cj(xfc)|| > t]^, and so we are in Phase l(brown 
shade). The onset of Phase 2 is marked by the crossing of the red solid and blue dashed 
lines; in iterations 3 and 4 we are in Phase 2. The terminal Phase 3 is entered when the 
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f(x,y) = lx + yl 


Optimized Augmented Lagrangian of f(x,y) = |x + y| 


0 . 

>> 

- 0 . 


Figure 4. The same as figure 4, but using (3.39) as an objective function instead. 

red solid line dips below the horizontal black dashed line marking the value of t ]*, and this 
finally occurs during the 5th iteration. 

The upper right and lower left panels in figure 3 respectively show the evolution of 
(the initial values of) the penalty parameter, and the Lagrange multiplier, A^, in each 
iteration. We see that during Phase 1, Hk was updated while was fixed, while in Phase 
2, Afc was updated while fik was held fixed. Throughout this process, the solution, Xk, from 
each step gradually converges to the analytic solution, (3.32), as shown in the lower right 
panel in figure 3. Note that from the first iteration on, the solutions are on the diagonal 
line X = y (see also the right panel in figure 2) — except for their starting values, the red 
and blue lines in the lower right panel of figure 3 essentially overlap. 

3.2.2 Example two 

Our second example is very similar to the one considered in the previous subsection, except 
now we change the objective function to 

f{x,y) = \x + y\, (3.39) 

and we keep the same constraint as before: 

x‘^ + y‘^-l = 0. (3.40) 

The left panel in figure 4 plots the objective function, f{x,y), as well as the feasible set 
(the unit circle centered on the origin). Note the “fold” along the line y = —x. On this 
line, the objective function is not a smooth function, hence one of the basic assumptions 
generally employed in the theory of constrained optimization (see section 2) is not satisfied. 
Nevertheless, in such cases we typically find that the Optimass algorithm still converges 
efficiently to the correct solution. 
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It is clear from figure 4 that the current problem has a two-fold ambiguity, there are 
two equivalent solutions for the global minimizer: 


(x*, y*) = 


\/2 \/2 


or 


y*) = - 


\/2 \/2 


2 2 / " ' \ 2 2 I 

Starting from the initial point (xq,7/q) = (0.6,0.7), Optimass converges to 

{x*,y*) = (-0.707132,0.707132) 


(3.41) 


(3.42) 


in a single iteration (i.e., without taking /r —>■ 0). The right panel in figure 2 shows a 
contour plot of the augmented Lagrangian, 

C{x, y; /io, Ao) = |x y| - Aq (x^ + y'^ - 1) + ^ {x'^ + y'^ - l)^ , (3.43) 

ZflQ 

for the only step that the algorithm needed, the initial step. The “folded valley” feature 
of the objective function suggests that one does not need much additional convexification 
from the penalty term (since we are already in a valley), and the algorithm converges very 
quickly. 

In both of these two toy examples, as well as in numerous physics motivated stud¬ 
ies described in the next section, we verified that the numerical solutions obtained with 
Optimass are stable with respect to small variations of the default initial values of the 
parameters, and in particular Xq. 


4 Calculating M 2 variables with Optimass 

In this section, we describe the main intended use of Optimass, the calculation of kinematic 
variables suitable for analyzing missing energy events at hadron colliders. The calculation 
involves a minimization of a mass function over a number of invisible particle momenta, 
subject to certain constraints (e.g., on-shell constraints, or the missing transverse momen¬ 
tum constraint (1-1)). In particular, we will show how the code can be used to calculate 
the recently proposed M 2 variables with non-linear constraints [26, 76, 77]. 

We first provide a brief review of the M 2 variables; then define the relevant objective 
function and identify the sorts of constraints that may be imposed. We then demonstrate 
the performance of Optimass in the calculation of M 2 in the physically important case of 
top quark pair production, when both tops decay leptonically. 

4.1 Introduction to M 2 

The M 2 variable [26, 76, 77] is a (3 -|- l)-dimensional analogue of the well-known Mt 2 
variable [57]. Both are typically applied to final states that may result from (a) the pair^® 
production of “mother” particles that (b) subsequently decay to both visible and invisible 
particles. The best motivated scenarios typically have too many invisible particles in the 
final state, so that we cannot, in general, reconstruct the masses or momenta of all of the 

^®Hence the subscript “2”. 
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Figure 5. The event topology for the decay process in eq. (4.1), together with the three possible 
subsystems. The blue dotted, the green dot-dashed, and the black solid lines indicate the subsystems 
(a), (&), and (ab), respectively. 

intermediate particles in the event with certainty. Both M 2 and Mt 2 are thus constructed to 
provide an ansatz for the invisible particle momenta. This ansatz involves the minimization 
of a suitably dehned kinematic mass function of the visible and invisible momenta in the 
event. Mt 2 , by construction, is restricted to the transverse plane, and does not involve the 
longitudinal momenta of the invisible particles. On the other hand, M 2 is not limited to 
the transverse plane, and thus provides an ansatz for the longitudinal invisible momenta 
as well. This ansatz can be usefully applied to particle mass reconstruction and event 
topology disambiguation [77, 78]: once we obtain values for the three-momenta of the 
invisible particles, we can work backwards to reconstruct the masses of the heavier particles 
produced in the intermediate steps of the relevant decay in terms of the hypothesized masses 
of the invisible particles in the final state. While the mass of such intermediate resonances 
is a priori unknown, in symmetric event topologies the two decay sides are identical (by 
definition), and we can impose the condition that the mass of an intermediate resonance 
of interest is the same in both decay chains. Adding such on-shell constraints further 
restricts the allowed solution space for the individual invisible momenta, leading in general 
to a different outcome from the procedure of minimization. Thus the imposition of different 
on-shell constraints leads to new, physically-motivated kinematic variables. 

Let us consider for concreteness a process in which a pair of heavy particles. A, un¬ 
dergo identical two-step, two-body, cascade decays, i.e., each A decays into two (massless) 
visible particles, a and b, plus a (massive) invisible particle, C, via an on-shell intermediate 
particle, B, (see figure 5) 

A^ a B ^ abC. (4.1) 

For simplicity, we assume that all visible particles are fully distinguishable and that particle 
a is emitted before particle 5, i.e., we do not address the combinatorial issues since they 
are not relevant for the current discussion of computing the M 2 variables. 

Given the event topology of figure 5, one can consider three different subsystem topolo¬ 
gies, depending on which of the particles along the red dashed lines are treated as mothers 
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and which are treated as daughters [73, 77] . For example, considering the event as a whole 
corresponds to subsystem (ab), in which Ai are the mothers, Ci are the daughters, while Bi 
are intermediate resonances, dubbed “relatives” in [77]. We are interested in placing the 
maximum possible lower bound on the mass of A, as a function of the hypothesized mass, 
m, of C. The prescription for doing so is well-known (see, e.g., [26]): we minimize of the 
heavier of the two parent masses, and subject to relevant kinematic constraints. 
In the simplest case, we only apply the missing transverse momentum constraint, (1.1), 
and obtain 


M2(m) = min{max[MAi(gi,m), M^2(g2,m)]}, (4.2) 

Ql,Q2 

QlT + Q2T = fr 


where qt denotes the three-momentum of the invisible particle, Ct. Note that the missing 
transverse momentum condition is linear and is easily solved, so that the minimization 
in (4.2) is unconstrained and can be performed over an unconstrained four-dimensional 
momentum space, e.g., {qiT,qiz,q 2 z}- 

The situation becomes much more interesting (and challenging) when we consider 
additional nonlinear constraints. Given the process of figure 5, it is natural to consider 
additionally constrained versions of eq. (4.2). Having already made the assumption that 
the two decay chains are identical^^, we can additionally impose that the particles Ai and 
A 2 have the same mass. 


Ma^=Ma2, (4.3) 

that the particles Bi and B 2 have the same mass, 

Mb,=Mb„ (4.4) 

or both (4.3) and (4.4). Together with the case where neither (4.3) or (4.4) is required 
to hold, a total of four variants are therefore possible. Following the same notation as 
ref. [77], we introduce two more subscripts on the M 2 variable to indicate whether the 
constraints in eqs. (4.3) and (4.4) were applied during the minimization or not. The first 
subscript will refer to the parent constraint in eq. (4.3), while the second subscript will 
refer to the relative constraint in eq. (4.4). If a constraint is imposed, the corresponding 
index is “C”, otherwise it is “X”. Therefore, eq. (4.2) can be expressed as M 2 XX because 
no extra constraints are imposed: 

M 2 xx{'m) = min{max[MAi(gi,m), 114 ^ 2 (^ 2 , hr)]} , (4.5) 

qiT + q2T = fr- 

The other three variables are formally defined as follows: 

M2cx{'m) = min{max[MAi(gi,m), MA^{q2,rh)]} , (4.6) 

91 >92 

^^See refs. [60, 61, 78] for relaxing this assumption. 
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qiT + q2T = fr 


Ma, 

= Ma, 



M2xc{rh) 

= min {max [M^i^ (gi, m). 

91 >92 

< MA2{q2,rh)]} , 

(4.7) 

QlT + 92T 

= 



Ml 

= Ml 



M2ccim) 

= min {max [Ma-^ (^1 , rh) 

91:92 

, MA2{q2,'fh)]} . 

(4.8) 

QlT + q2T 

= fr 



Ma, 

= Ma, 



Ml 

= Ml 




Eqs. (4.5-4.8) define the four possible M 2 variables for the {ab) subsystem. One can sim¬ 
ilarly define four M 2 variables for each of the (a) and (b) subsystems, we refer the reader 
to [77] for the exact definitions. 

4.2 Calculating M 2 for dilepton top events 

From the definitions (4.6-4.8) it is clear that the problem of computing the variables M 2 CX, 
M 2 XC) and M 2 CC falls into the general category of constrained minimization problems 
(2.1, 2.2) which Optimass is designed to solve. In the remainder of this section, we shall 
therefore illustrate the functionality of Optimass with the physics example of figure 5. 
Specifically, we consider the case of pair-produced top quarks that decay fully leptonically: 

pp —> ti, {t —^ bW~^ —> bl'^vi), {i —> bW~ —^ bl~L>i). (4.9) 

Thus, in figure 5, particle A is associated with the top quark, particle B with the IT-boson, 
and particle C with the neutrino. For simplicity, since our major interest is not in the shape 
of the distributions but in the precision of the minimization procedure, we consider events 
where the top quarks are produced at threshold and decay according to phase space. We 
neglect initial and final state radiation, and also do not take into account experimental 
efficiencies, cuts, combinatorics, and detector resolution. All those effects are important in 
a real physics analysis, but are irrelevant to the question of evaluating the performance of 
the Optimass minimization algorithm, which is our goal here. 

We use Optimass to compute the values for the M 2 cx^ ^ 2 X 0 ^ and M 2 CC variables for 
all three subsystems. In order to judge the precision of this numerical calculation, ideally 
we need to identify an alternative method for computing these answers, which would give 
us reference benchmarks. Fortunately, for the case of the M 2 CX variable, such a benchmark 
is provided by the Mt 2 variable itself — we can use the result, proven in ref. [77], that 
M 2 CX and Mt 2 are identical event by event. This enables us to directly compare the values 
of Mt 2 and M 2 CX for each of the three possible subsystems. The Cambridge variable Mt 2 
can already be reliably computed with one of several publicly available codes; here we 
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Sobsystem (ab): m=0 GeV 



Subsystem (a): m=80 GeV 



Subsystem (!>): m=0 GeV 
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Mt 2 or Mzcx (GeV) 



Mt 2 or 3/; cx (GeV) 


Figure 6. Comparison between Mt 2 (blue shaded histograms) and M 2 CX (red histograms) for 
each subsystem. For the (ab) and (a) subsystems, the relevant Mt 2 values are evaluated by the 
well-known analytic formula, whereas those for the (b) subsystem are computed numerically with 
the package from ref. [79]. The vertical black dashed lines indicate the expected endpoint of the 
Mt 2 distribution for the given test mass. 
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Figure 7. Scatter plots of the difference M 2 CX — versus the reference value, 


use the package from ref. [79]. Furthermore, for the {ab) and (a) subsystems, analytical 
formulae for Mt 2 are also available [69, 72], facilitating the comparison.^® 

Figures 6 and 7 show the results from the comparison between the value of M 2 CX 
obtained from Optimass and the corresponding reference value, for all three sub¬ 

systems. Since the exercise is performed with the tt decay sample, we take the trial mass to 
be the true mass of the daughter particle: m = 0 GeV for the {ab) and the (5) subsystems 
and m = 80 GeV for the (a) subsystem. We choose the M parameter in eq. (3.9) to be 200 
GeV for subsystems {ab) and (a), and 100 GeV for subsystem (6). Figure 6 reveals that 
the distributions of the on-shell constrained M 2 variable, M 2 CX, (red dashed histograms) 
are almost identical to the corresponding Mt 2 distributions (blue dot-dashed shaded his- 

^®In the case of the (6) subsystem, the 6-quarks simulate initial state radiation, in which case no analytical 
formula for Mt 2 is known, so we need to rely on the computer code from ref. [79]. 


- 25 - 








































Subsystem {ab): GeV 



Subsystem (</): /7i=80 GeV 


Subsystem (b): nt=0 GeV 


80 100 120 140 160 180 

Mj yc [Ge\ l 


0 20 40 60 80 

w?Ar [GeA'l 


Figure 8. The comparison between the values of M2XC obtained by two different internal codes. 
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Figure 9. The same as figure 8 , but for M2cc- 


tograms). Only a handful of events show a difference on the order of 1 — 2 GeV, as seen 
in the scatter plots of figure 7. The results shown in figures 6 and 7 were obtained with 
the default values of the Optimass parameters. Of course, the precision can be further 
improved by tweaking the relevant tolerance parameters, increasing the maximum number 
of iterations, or improving on the initial guess of Xq. However, this will come at the cost 
of increased computation time; we believe that the level of precision seen in these figures 
should be sufficient for most practical analyses. 

Figures 8 and 9 provide a similar validation for the case of the M 2 XC and M 2 CC vari¬ 
ables. In this case, however, we do not have a readily available benchmark for comparison: 
first, because analytical formulas for those cases do not exist, and second, because there 
is no publicly available code which is able to handle M 2 XC and M 2 cc- This is why we 
produced two different versions of our code (created independently by different sets of the 
current authors), and proceeded to compare their results for M 2 XC and M 2 CC ia figures 8 
and 9, respectively. The figures show that the two internal codes agree reasonably well, with 
notable differences only in about 1% of the events. The events with the largest deviations 
were scrutinized further, revealing that one of the codes typically found a local minimum, 
due to a different choice of starting values for Xq. When repeating the minimization with 
a range of possible choices for Xq, and taking the minimum of the obtained set of values of 
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Figure 10. The same as Fig. 3, but for the M2CC calculation in the {ab) subsystem of the single 
event considered in section 4.3. 

M 2 , the two codes were shown to be in exact agreement. 

4.3 Demonstration of Optimass for one event 

In conclusion, we supplement the toy examples from section 3.2 with one example of a real 
tt event. The 4-momenta {E,px,Py,Pz) of the four visible particles (in GeV) are 

Pal = (68.003, -8.404, 16.069, -65.541) 

Pbi = (56.168, -29.282, -29.683, 37.635) (4.10) 

= (68.003, 6.881, -56.711, -36.890) 

Pb2 = (81.160, -27.332, 68.553, 33.769), 

thus the missing transverse momentum is 

fr = (58.137,1.772). (4.11) 

The initial parameters for the algorithm are given by 

Qtyfi ,qUo,qL,o) = (-40,+40,-40,+40) (GeV), (A?,A0) = (0, 0). 
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Figure 11. The same as figures 2 and 4 , but for the single event considered in section 4 . 3 . Since 
the objective function has four independent arguments, in order to visualize the evolution of the 
minimizer, we plot qiz and 522, having fixed the other two variables, qix and qiy, to the values 
which minimize the objective function for the given choice of qiz and 922- 


Figure 10 is the analogue of Fig. 3, showing the convergence to a satisfactory solution 
for M 2 CC in subsystem {ah) after the k = A step, giving 

= (38.082, 5.612, 26.598, -8.717) (GeV), 

(A^*,Af) = (0.000001,-122.329803), (4.12) 

= 0.025. 

Figure 11 is analogous to figures 2 and 4 for this case. The left panel plots the original 
objective function, while the right panel shows a contour plot of the augmented Lagrangian 
function as of the final {k = 4) iteration. In the right panel, the set of points which satisfy 
the constraint eq. (4.3) (eq. (4.4)) is shown in blue (red). As before, the magenta points 
mark the locations of the minimizer, found in the k^'^ iteration. 

5 Conclusions 

With the restart of the LHC, the quest for new physics has resumed. We believe that 
kinematic variables like M 2 will play an increasingly important role in searching for SUSY 
and related models; the gain in sensitivity that these variables provide (see, e.g., [78]) aids 
both in setting limits on and in discovering BSM physics. 

Since the calculation of M 2 , like many other kinematic variables, involves a constrained 
minimization that must be performed numerically, it is important to ensure that this 
calculation is performed in an efficient and reliable way. Thus we have introduced the 
public package, Optimass, which achieves these important goals. Our algorithm utilizes 
the ALM and interfaces with the popular unconstrained minimization package, Minuit. 
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Figure 12. An example of the general event topology which can be handled by Optimass. We 
allow for two decay chains, involving a sequence of 2-body or 3-body decays. In each decay, the 
final state particles can be visible, invisible, or both. 


We have described the relevant issues in what we hope will be sufficient depth to aid 
the physicist new to the challenges of constrained optimization. The Optimass algorithm 
has been described in detail and examples of its use have been provided. We compared 
analytically calculated values of Mt 2 to the M 2 CX variable obtained using Optimass and 
found excellent agreement. Other tests of Optimass were performed and the results are 
encouraging. We stress that while our physics example in section 4 was limited to the 
dilepton tt topology of figure 5, Optimass has been designed to handle arbitrarily general 
event topologies, as indicated in figure 12: 


• Multiple invisible particles in each decay chain. In many motivated scenarios, invisible 
particles may appear not only at the end of the decay chain (as is customary for dark 
matter particles), but also at intermediate stages. The Optimass code can handle 
such cases, since the total number of invisible particles is unrestricted. In figure 12, 
the first decay in the lower chain and the second to last decay in the upper chain 
provide examples of sources of such intermediate invisible particles. 


• Multi-body decays. The decay chains can be constructed of two-body decays, three- 
body decays, etc. Furthermore, a multi-body decay may result in a set of final state 
particles which can be visible, invisible, or both. As an illustration, in figure 12 
we show three two-body decays and three three-body decays. Two of the three-body 
decays result in visible particles only, while the remaining one produces both a visible 
and an invisible final state particle. 


This more general functionality of Optimass will be explored and demonstrated in a future 
publication [101]. 

In conclusion, we look forward to implementing improvements to and extensions of the 
Optimass framework, and to its use in the search for new physics that lies ahead. 
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A Unconstrained minimization with Minuit 


Using the methods of section 2, and in particnlar the ALM, we can convert a constrained 
optimization problem into an unconstrained minimization problem; the next step is to 
actually solve the resulting unconstrained minimization problem. For this task we use 
Migrad and Simplex, two algorithms which are part of the Minuit library. Thus, after 
briefly introducing this ubiquitous library, we will discuss these algorithms in sections A.l 
and A.2, respectively. 

Minuit [84] is a popular function minimization library. It is often used for data 
analysis, as the minimization of functions and likelihoods represents perhaps the main 
use of minimization in experimental particle physics. Minuit was initially written in 
Fortran, but has been reimplemented (as Minuit2 [92]) in CH—h, taking advantage of 
its object-oriented features; Minuit2 is included in the math library of the omnipresent 
data analysis package Root [93]. Minuit and Minuit2 (which we will henceforth refer to 
simply as “Minuit”) contain various minimization algorithms, offering the user a choice. 
Among the several main algorithms (Migrad, Seek, Sgan, Simplex), we choose to use 
Migrad and Simplex which are briefly described in the next two subsections. 


A.l Migrad 


Migrad utilizes a variation of the Newton’s method called a Variable Metric Method 
(VMM) [94]. We remind the reader that Newton’s method is an iterative method for 
finding the root of a function /(x) in which 


Xn+l Xn — X 

f [Xn) 


(A.l) 


Finding a minimum of f{x) rather than a root corresponds to finding a zero of f'{x). In this 
case the sequence of approximate solutions obtained by Newton’s method are described by 


^n+1 — 


f{Xn) 

f'iXn)' 


(A.2) 


The analogous expression in the multidimensional case is 


Xn+l Xn — H (x^)V/(X}^), 


(A.3) 


where is the inverse of the Hessian matrix. The name “Variable Metric Method” 

is due to an interesting parallel with General Relativity. Namely, in the limit where the 
objective function, /(x), is a quadratic form with minimum at x = 0, then, for small x, 

/(x) -/(O) Ri x^ Qr(x)^ X. (A.4) 

The expression on the right hand side is a bilinear form with (I/2)R(x) playing the role 
of the metric tensor. If one chooses 

n 

f{x) = ^xj (A.5) 

i=l 
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in n dimensions, then {l/2)H{x) is precisely the n-dimensional Euclidean metric. The 
quantity on the right hand side of eq. (A.4) is known as the “estimated vertical distance 
to minimum” (EDM), i.e., 

EDM = (A-6) 

Generally, when using a VMM, the optimality condition will check whether the calculated 
value for the EDM exceeds a certain tolerance parameter. 

Eq. (A.3) describes the essential idea of the VMM. However, the VMM is a “quasi- 
Newton method” (as opposed to Newton’s method itself) because instead of calculating the 
Hessian (or “metric”) exactly, it approximates it iteratively. The main differences among 
different VMM algorithms lie in the precise form of this iterative approximation procedure. 
The first VMMs used the so-called DEP updating formula [94, 95] (after Davidon, Eletcher, 
and Powell). Currently, the most common algorithms, including MiGRAD, use the BEGS 
method [96-99]. 

A very useful property of VMMs is that subsequent steps are in “conjugate” directions, 
i.e., in orthogonal directions with respect to the metric provided by the Hessian; addition¬ 
ally, convergence to the minimum is efficient. Thus, the algorithm rarely crosses the folded 
region of the M 2 -objective function where the gradient and Hessian are not defined. This 
fact motivates the use of the Migrad implementation of the VMM for our constrained M 2 
calculations. 

A.2 Simplex 

Unlike VMMs, the downhill simplex (or Nelder-Mead) method [100], does not require the 
calculation of gradients. Instead, one calculates the values of f{x) at the n -|- 1 vertices of 
a simplex, a non-degenerate solid in n dimensions with n -|- 1 sides and n -|- 1 vertices. A 
new vertex for the simplex is generated in each iteration of the method. If the value of the 
objective function at the new point is lower than the value at one of the existing vertices, 
the worst vertex is replaced by the new point. In this way, the volume of the simplex 
becomes smaller; the algorithm stops when the simplex, now enclosing the minimum, has 
shrunk to a specified size. 

To be more concrete, let us first consider a (large) simplex of n-|- 1 points in n dimen¬ 
sions, with vertices 


PliP2i • • •) Pn: Pn+1 ■ 


(A.7) 


These points are ordered so that 


/l < /2 < ... <fn< fn+1, 


(A.8) 


where fi = f{pi). We define the “center of mass”, p, using all points except Pn+i as follows, 





(A.9) 
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In each step, the algorithm tries to replace the worst point, Pn + i - First, a new test point, 
Prj is obtained by reflection of the worst point about the center of mass, 

Pr=p + a { p - pn + i ), (A. 10) 

for some typical value of the expansion factor (generally a = 1 ; this is the value we shall 
use in our use of Simplex). A new point is then determined using the value of f { pr ) as 
follows: 

!• /i < /r < /n: The previous worst point, Pn+i, is replaced by pr, and the points are 
relabeled in accordance with (A. 8 ). 

2. /r < /i: The test point, pr, is the best point, so the current search direction is con¬ 
sidered to be effective. We therefore shift the first n points 

Pl ^ P 2 , P 2 ^ PZ ,-; Pn ^ Pn + l - (A.H) 

To determine the new value for pi, we try one additional point, psi = P + f 3 {pr — p ) 
(typically (3 = 2), and evaluate its functional value, fsi- If /si < /r, we set pi = psi, 
otherwise pi = Pr- 

3- fr > fn' The simplex may be too big and therefore its size must be reduced. If 
fr > fn+i, then a new contracted simplex is defined around the best point, pi, by 
replacing all points except pi by Pi = pi + 6{pi — pi) with 0 < 5 < 1 (typically 
5 = 0.5). If fn < fr < fn+i, then Pn+i is replaced by A test of the new inner 

point, Ps 2 = P — liP — Pn+i) (typically 7 = 0.5), is then performed, and Pn+i is 
replaced by Ps 2 if fs 2 < fn+i- 

Since the simplex method is always designed to take as big a step as possible, it is rather in¬ 
sensitive to shallow local minima and other small-scale structures in the objective function. 
Thus, we use the method to identify promising candidates for global minima. Once the 
location of a possible global minimum has been identified, the MiGRAD algorithm described 
above is used to obtain a more precise value of any local minimum in this area, hopefully 
obtaining an accurate value for the location of the global minimum. The downhill simplex 
method is implemented in Minuit using the Simplex algorithm. 

B Installation and user instructions 

The latest version of the Optimass has been developed and designed to achieve the au¬ 
tomation of kinematic mass function minimization with constraints for general particle 
decay system. In particular, 

1. It has generality to treat various decay topologies from multiple parent particles 
where in general the multiplicity can be larger than two. 

2. It also has generality to include decay vertices where in general the multiplicity of 
branch legs can be larger than three. 
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3. It has flexibility to easily define a specific sub-system of intermediate parent particles 
with effective invisibles, in the full decay system. 

4. It also has flexibility to define kinematic constraint functions of user’s own interest, 
in terms of visible and invisible particles’ momentum degrees of freedom. 

5. All these generality and flexibility can be initiated from user’s simple model card file 
which defines 1) full decay process with user’s own particle label scheme, 2) parent 
node particle in each decay chain, 3) effective invisible nodes in each decay chain, 
and 4) constraint functions of particle momenta which can be interactively expressed 
by the user’s particle label scheme. 

The Optimass is free software written in C++ and Python under the copyleft of the 
GNU General Public License. The latest version of the Optimass can be downloaded from 
the following web page ; 

http://hep-pulgrim.ibs.re.kr/optimass 

More detailed Optimass installation guide and the tutorial with examples on how to run 
the code implementing user’s own decay topologies, can be found on the webpage as well. 
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